Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: allow tests to specify a cockroach binary to use #113301

Merged
merged 3 commits into from
Nov 8, 2023

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Oct 30, 2023

Currently, roachtests must manually upload their own cockroach binary if needed through the Put API.
However, almost all roachtests upload the standard t.Cockroach() binary to ./cockroach on all nodes,
resulting in the same Put code being duplicated at the start of most tests. Additionally, to collect artifacts
we still need a cockroach binary at a discoverable path, leading to the same binary being copied twice in many
cases, see: #97814

This change adds a TestSpec option which lets tests specify a cockroach binary to use. If one is not specified,
we now upload the t.Cockroach() binary to ./cockroach. This lets us remove cockroach-default logic for artifacts,
and removes the need to manually upload binaries at the start of each test.

Release note: None
Fixes: #104729

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong
Copy link
Contributor Author

DarrylWong commented Oct 30, 2023

Test Run

Looks good, two failures that have been fixed on master and one that still fails on master.

@DarrylWong DarrylWong force-pushed the refactor-put branch 7 times, most recently from 057d12b to cdd2b40 Compare November 3, 2023 15:39
@DarrylWong DarrylWong marked this pull request as ready for review November 6, 2023 15:21
@DarrylWong DarrylWong requested review from a team as code owners November 6, 2023 15:21
@DarrylWong DarrylWong requested review from srosenberg and renatolabs and removed request for a team November 6, 2023 15:21
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

This is fantastic, the double copy was very annoying. Nice job!! 👍

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

// runtime assertions enabled. Note that we upload to all nodes even if they
// don't use the binary, so that the test runner can always fetch logs.
func (c *clusterImpl) PutCockroach(ctx context.Context, l *logger.Logger, t *testImpl) error {

[nit] this feels like it belongs with testImpl rather than clusterImpl


pkg/cmd/roachtest/cluster.go line 1848 at r1 (raw file):

		return c.PutE(ctx, l, t.RuntimeAssertionsCockroach(), test.DefaultCockroachPath, c.All())
	default:
		return errors.Errorf("Specified cockroach binary does not exist.")

[nit] error is confusing, sounds like a binary is missing, maybe "unknown CockroachBinary value %v"


pkg/cmd/roachtest/registry/test_spec.go line 434 at r1 (raw file):

// ClusterCockroachBinary is the cockroach binary that will be uploaded
// to every node in the cluster at the start of the test. We upload to
// every node so that we can fetch logs in the case of a failure.

[nit] It's important to mention the default value when it's not specified.

@renatolabs
Copy link
Contributor

Very exciting to see all these c.Put calls going away!

Nit: can we make the PR title match the commit?

Thought: should we make t.Cockroach() private? Is there still a legitimate use for it?

The thinking is that muscle memory plays a strong role in roachtest, and it's only a matter of time before this c.Put call is added back to new or existing tests. A compilation failure would be a pretty good indication that that should not be done anymore.

Doesn't have to be part of this PR, just wanted to ask the question.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 111 of 111 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] this feels like it belongs with testImpl rather than clusterImpl

Not sure about that; this function is uploading a file to the cluster. Most other functions that do this live in the cluster level (e.g., PutLibraries, PutString, Put itself, Stage, etc.)

@DarrylWong DarrylWong changed the title Refactor put roachtest: allow tests to specify a cockroach binary to use Nov 7, 2023
Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, just noticed a few t.Cockroach() slipped through because they call c.PutE. Will have to rerun those but since they just upload cockroach to every node I'm confident there shouldn't be any issue with them.

Nit: can we make the PR title match the commit?

Done.

should we make t.Cockroach() private? Is there still a legitimate use for it?

The autoupgrade and schemachange/mixed-versions still use it. The latter should be ported to the new mixed-version framework though so not sure if it will be still using it then. Besides those two, it's not being called anywhere else in roachtests so yes I think we could make it private if autoupgrade is addressed.

I didn't dig too deeply into it, but is there a reason why we can't port autoupgrade to the new framework? I see it has the warning at the top to not use it for mixed-version but it doesn't explain why.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Not sure about that; this function is uploading a file to the cluster. Most other functions that do this live in the cluster level (e.g., PutLibraries, PutString, Put itself, Stage, etc.)

I'm content with either location but yeah Renato's logic is why I originally had it in clusterImpl.


pkg/cmd/roachtest/registry/test_spec.go line 434 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] It's important to mention the default value when it's not specified.

Done but also moved it along with other parts of the comment to the TestSpec struct to better reflect how the other comments are.

@RaduBerinde
Copy link
Member

pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

Previously, DarrylWong wrote…

I'm content with either location but yeah Renato's logic is why I originally had it in clusterImpl.

Well.. this function is deciding which binary to upload and then calls Put to upload. Is the cluster or the test responsible for that decision?

More generally - should clusterImpl have to understand anything about testImpl? If it's not necessary, it shouldn't. I know there are a few uses of testImpl already but they are pretty small and easily fixable.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

Previously, RaduBerinde wrote…

Well.. this function is deciding which binary to upload and then calls Put to upload. Is the cluster or the test responsible for that decision?

More generally - should clusterImpl have to understand anything about testImpl? If it's not necessary, it shouldn't. I know there are a few uses of testImpl already but they are pretty small and easily fixable.

IMO, there's no decision being made here. The cluster is reading the option chosen by the user, if any, and calling Put based on that. In case the choice is randomized, the actual choosing logic is in the testImpl.

@DarrylWong DarrylWong added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Nov 7, 2023
@RaduBerinde
Copy link
Member

pkg/cmd/roachtest/cluster.go line 1839 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

IMO, there's no decision being made here. The cluster is reading the option chosen by the user, if any, and calling Put based on that. In case the choice is randomized, the actual choosing logic is in the testImpl.

Fine with me either way (always the case with nits).

@DarrylWong
Copy link
Contributor Author

DarrylWong commented Nov 7, 2023

I messed up and used the wrong logger in this sanity TC run, but confirmed the cockroach random seed print works on gceworker after switching to the test logger.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong and @srosenberg)

Currently, roachtests must manually upload their own cockroach binary if needed through the Put API.
However, almost all roachtests upload the standard t.Cockroach() binary to ./cockroach on all nodes,
resulting in the same Put code being duplicated at the start of most tests. Additionally, to collect artifacts
we still need a cockroach binary at a discoverable path, leading to the same binary being copied twice in many
cases, see: cockroachdb#97814

This change adds a TestSpec option which lets tests specify a cockroach binary to use. If one is not specified,
we now upload the t.Cockroach() binary to ./cockroach. This lets us remove cockroach-default logic for artifacts,
and removes the need to manually upload binaries at the start of each test.

Release note: None
Fixes: cockroachdb#104729
This change prints the cockroach random seed used by metamorphic
builds to test.log for ease of debugging. Before this seed was found
only in cockroach.log.
Before if a test uploaded a non t.Cockroach() binary github posting
would have no knowledge of that and naively use t.Cockroach() to
determine metamorphism or not. This fixes that.
@DarrylWong
Copy link
Contributor Author

DarrylWong commented Nov 8, 2023

Added a last second addition that fixes github label posting for tests that don't use t.Cockroach(). This is easy now that we have the test spec, before github posting had no knowledge of what tests decided to upload. I think it should be a very low impact change but happy to push it back to a followup PR. Otherwise I'll go ahead and ship it.

@renatolabs
Copy link
Contributor

Looks good, thanks @DarrylWong!

@DarrylWong
Copy link
Contributor Author

TFTRs!!!

bors r=renatolabs, RaduBerinde

@craig
Copy link
Contributor

craig bot commented Nov 8, 2023

Build succeeded:

@craig craig bot merged commit 46220d9 into cockroachdb:master Nov 8, 2023
8 checks passed
Copy link

blathers-crl bot commented Nov 8, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 009c53d to blathers/backport-release-23.1-113301: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 009c53d to blathers/backport-release-23.2-113301: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@rafiss
Copy link
Collaborator

rafiss commented Nov 15, 2023

no need to rush, but i just wanted to check if this is getting backported to 23.1? if so, i will wait on it before merging #114197

@DarrylWong
Copy link
Contributor Author

Yes but the backport is dependent on #113894 being backported and the plan for that is to wait until we can shake out all the runtime assertion flakes. Given the amount of flakes we've seen so far, I don't think we're in a rush to but someone else feel free to correct me.

Maybe it's worth it to just stick a c.Put() in there for the backport if you think the additional few weeks of test coverage is worth it? I can always remove it as part of this backport and worst case it wouldn't break anything if forgotten about.

@rafiss
Copy link
Collaborator

rafiss commented Nov 16, 2023

Maybe it's worth it to just stick a c.Put() in there for the backport if you think the additional few weeks of test coverage is worth it?

sounds good! i will do that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: allow tests to specify which cockroach binaries they need
5 participants